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Abstract 

We study convex relaxations of the image labeling problem on a con- 
tinuous domain with regularizers based on metric interaction potentials. 
The generic framework ensures existence of minimizers and covers a wide 
range of relaxations of the originally combinatorial problem. We focus on 
two specific relaxations that differ in flexibility and simplicity - one can be 
used to tightly relax any metric interaction potential, while the other one 
only covers Euclidean metrics but requires less computational effort. For 
solving the nonsmooth discretized problem, we propose a globally conver- 
gent Douglas-Rachford scheme, and show that a sequence of dual iterates 
can be recovered in order to provide a posteriori optimality bounds. In 
a quantitative comparison to two other first-order methods, the approach 
shows competitive performance on synthetical and real-world images. By 
combining the method with an improved binarization technique for non- 
standard potentials, we were able to routinely recover discrete solutions 
within l%-5% of the global optimum for the combinatorial image labeling 
problem. 



1 Problem Formulation 

The multi-class image labeling problem consists in finding, for each pixel x in 
the image domain C M'^, a label £{x) G {!,...,/} which assigns one of I class 
labels to x so that the labeling function £ adheres to some local data fidelity as 
well as nonlocal spatial coherency constraints. 

This problem class occurs in many applications, such as segmentation, mul- 
tiview reconstruction, stitching, and inpainting [PCF06] . We consider the vari- 
ational formulation 

inf f{£), [ s{x,£{x))dx+ J{£). (1) 

£:0-i.{l,...,i} ^-^^ 

V rcgularizcr 
data term 

The data term assigns to each possible label i{x) a local cost s{x,£{x)) G M, 
while the regularizer J enforces the desired spatial coherency. We will in partic- 
ular be interested in regularizers that penalize the weighted length of interfaces 
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Figure 1: Convex relaxation of the multiclass labeling problem. The assignment 
of one unique label to each point in the image domain $7 is represented by a 
vector- valued function u : — K'. Ideally, u partitions the image into I sets by 
assuming one of the unit vectors {e^, . . . , e'} everywhere. By relaxing this set 
to the standard (probability) simplex A;, the originally combinatorial problem 
can be treated in a convex framework. 



between regions of constant labeling. Minimizing / is an inherently combina- 
torial problem, as there is a discrete decision to be made for each point in the 
image. 

In the fully discrete setting, the problem can be expressed in terms of a 
Markov Random Field |Win06| with a discrete state space, where the data 
and regularization terms can be thought of as unary and binary potentials, 
respectively. For graph-based discretizations of J, the resulting objective only 
contains terms that depend on the labels at one or two points, and the problem 
can be approached with fast graph cut-based methods. Unfortunately, this 
scheme introduces an anisotropy |Boy03| and thus does not represent isotropic 
regularizers well. Using ternary or higher-order terms, the metrication error can 
be reduced, but graph-based methods then cannot be directly applied. 

However, it can be shown that even in the graph-based representation the 
problem is NP-hard for relatively simple J [BVZ01| . so we cannot expect to 
easily derive fast solvers for this problem. This is in part caused by the discrete 
nature of the feasible set. In the following, we will relax this set. This allows to 
solve the problem in a globally optimal way using convex optimization methods. 
On the downside, we cannot expect the relaxation to be exact for any problem 
instance, i.e. we might get non-discrete (or discrete but suboptimal) solutions. 

There are several choices for the relaxation method, of which in our opinion 
the following is the most transparent (Fig. [T]): Identify label i with the «-th unit 
vector e* G K', set E := {e^, . . . , e'}, and solve 

inf f{u) , f{u) := / {u{x), s{x))dx + J{u) . (2) 

The data term is now linear in u and fully described by the vector 

s{x) := (si(x), . . . , Si(x))^ := {s{x, 1), . . . , s(x, l))" . (3) 

Due to the linearization, the local costs s may be arbitrarily complicated, pos- 
sibly derived from a probabilistic model, without affecting the overall problem 
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class. In this form, we relax the label set by allowing u to take "intermedi- 
ate" values in the convex hull convi? of the original label set. This is just the 
standard simplex A;, 

I 

Ai conv{e\ . . . ,e'} = {a e M'|a ^ 0,^a, = 1} . (4) 

2 = 1 

The problem is then considered on the relaxed feasible set C, 

C := {u e BY{ny\u{x) e A; for a.e. x e fl} . (5) 

The space of functions of bounded variation BV(il)' C {L^Y guarantees a mini- 
mal regularity of the discontinuities of u, see Sect. |2.2"l Assuming we can extend 
the regularizer J to the whole relaxed set C, we get the relaxed problem 



inf fiu) , f{u) := / {u{x), s{x))dx + J{u) . (6) 

If J can be made convex, the overall problem is convex as well, and it may likely 
be computationally tractable. In addition, J should ideally have a closed-form 
expression, or at least lead to a computationally tractable problem. 

Whether these points are satisfied depends on the way a given regularizer is 
extended to the relaxed set. The prototypical example for such a regularizer is 
the total variation, 

TV(u) = / \\Du\\ , (7) 

where || • |j denotes the Frobenius norm, in this case on W^'^^ . Note that u may 
be discontinuous, so the gradient Du has to be understood in a distributional 



sense (Sect. 2.2 1. Much of the popularity of TV stems from the fact that it 
allows to include boundary-length terms: The total variation of the indicator 
function xs of ^ set S, 

Per(5) := TV(x5) ■ (8) 

called the perimeter of S, is just the classical length of the boundary dS . 

In this paper, we will in more generality consider ways to construct regular- 
izers which penalize interfaces between two adjacent regions with labels i ^ j 
according to the perimeter (i.e. length or area) of the interface weighted by an 
interaction potential d : {1, . . . , Z}^ — >■ M depending on the labels (in a slight 
abuse of notation the interaction potential is also denoted by d, since there is 
rarely any ambiguity with respect to the ambient space dimension). The sim- 
plest case is the Potts model with the uniform metric d{i,j) — iS i = j and 
otherwise d{i,j) — 1. In this case, the regularizer penalizes the total interface 
length, as seen above for the total variation. 

As a prime motivation for our work, consider the two-class case I — 2 with 
J = TV. As here the second component of u is given by the first via U2 = 1 — ui, 
we may pose the relaxed problem in the form 

u\x){si{x) - S2{x))dx 2TV(u') , (9) 



mm 

tt'GBV(n), 
u'(a;)e[0,l] for a.o. xeQ, 
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where u'{x) is a scalar. This formulation is also referred to as continuous cut in 
analogy to graph cut methods. It can be shown |CEN06| that while there may 
be non-discrete solutions of the relaxed problem, a discrete - i.e. u'(x) G {0, 1} 
- global optimal solution can be recovered from any solution of the relaxed 
problem. We can thus reduce the combinatorial problem to a convex problem. 
While there are reasons to believe that this procedure cannot be extended to 
the multi-class case, we may still hope for "nearly" discrete solutions. 

1.1 Related Work 

The difficulty of the labeling problem varies greatly with its precise definition. 
Formulations of the labeling problem can be categorized based on 

1. whether they tackle the binary (two-class) or the much harder multiclass 
problem, and 

2. whether they rely on a graph representation or arc formulated in a spatially 
continuous framework. 

An early analysis of a variant of the binary continuous cut problem and the 
associated dual maximal flow problem was done by Strang |Str83| . Chan 
et al. |CEN06) pointed out that by thresholding a nonbinary result of the re- 
laxed, convex problem at almost any threshold one can generate binary solutions 
of the original, combinatorial problem (this can be carried over to any thresh- 
old and to slightly more general regularizers [BerOQi IZNF09| . The proof heavily 
relies on the coarea formula [FR60] , which unfortunately does not transfer to 
the multiclass case. The binary continuous case is also closely related to the 
thoroughly analyzed Mumford-Shah jMS89j and Rudin-Osher-Fatemi (ROF) 
[RQF92) problems, where a quadratic data term is used. 

For the graph-based discretization, the binary case can be formulated as 
a minimum-cut problem on a grid graph, which allows to solve the problem 
exactly and efficiently for a large class of metrics using graph cuts |KB051lBK04| . 
Graph cut algorithms have become a working horse in many applications as 
they are very fast for medium sized problems. Unfortunately they offer hardly 
any potential for parallelization. As mentioned in the introduction, the graph 
representation invariably introduces a grid bias |Boy03| (Fig. [2]). While it 
is possible to reduce the resulting artifacts by using larger neighborhoods, or 
by moving higher-order potentials through factor graphs, this greatly increases 
graph size as well as processing time. 

Prominent methods to handle the graph-based multiclass case rely on finding 
a local minimum by solving a sequence of binary graph cuts |BVZ01j (see |KT07| 
for a recent generalization) . These methods have recently been extended to the 
continuous formulation [TPCBOS] with similar theoretical performance |Ols09| . 
Our results can be seen as a continuous analogon to |Ish03| , where it was shown 
that potentials of the form d{i,j) = |i — j\ can be exactly formulated as a cut 
on a multi-layered graph. An early analysis can be found in |KT99| . where the 
authors also derive suboptiniality bounds of a linear programming relaxation 
for metric distances. All these methods rely on the graph representation with 
pairwise potentials. 

In this paper, we will focus on the continuous multiclass setting with higher 
order potentials in the discretization. Closely related to our approach is |CCP08| . 
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Figure 2: Discretization schemes. Left to right: Graph-based with 4- and 8- 
neighborhood; higher order potentials. In graph-based approaches the regular- 
izer is discretized using terms depending on the labels of at most two neighboring 
points. This leads to artifacts as isotropic metrics are not approximated well. 
Using higher-order terms the discrete functional more closely approximates the 
continuous functional (Fig.[3|. 




Figure 3: Effect of the discretization on an inpainting problem. Left to right: 
Input image with unknown black region; graph-based method using pairwise 
potentials (a-/3-swap code from [SZS"'"06j ): proposed method using higher order 
potentials. Due to the introduced anisotropy, the graph-based method shows a 
bias towards axis-parallel edges. 

In contrast to approaches that rely on a linear ordering of the labels |Ish03[ 
IBT09) . the authors represent labels in a higher-dimensional space. In a certain 
sense, |LLT06j can be seen as a predecessor of this approach: in this work, the 
authors represent the label assignment using a piecewise constant real-valued 
function, but parametrize this function using a set of / polynomial basis func- 
tions, which enables them to employ the Potts regularizer. 

The approach of |CCP08| allows to formulate interaction potentials of the 
form d{i,j) — 7(|« — j\) with nondecreasing, positive, concave 7. The authors 
provide a thorough analysis of the continuous model and propose a relaxation 
based on the convex envelope, which gives almost discrete solutions in many 
cases. We will extend this approach to the more general class of regularizers 
where d is an arbitrary metric. The same authors proposed a "Fast Primal- 
Dual" algorithm with proven convergence to solve the associated saddle point 
problem |PCBC09aj . By lifting the objective to a higher dimensional space, it 
turns out that the same method can be used to solve many problems also with 
nonlinear data term |PCBC09b) . 

Our approach is a generalization of |ZGFN08l lLKY+09] . where a similar 
linearization is used with the regularizer restricted to the Potts distance, and 
with less strong convergence results. These methods have also been extended 
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to segmentation on manifolds |DFPH09] . 

Regarding optimization, several authors proposed smoothing of the primal 
or dual objective together with gradient descent (Bcr09, BYTlOj . In contrast, 
our approach does not require any a priori smoot hing. Usin g Nesterov's method 
[Nes04j for the labeling problem was proposed in [LKY+09] . An earlier analysis 
of the method in the context of £i-norm and TV minimization can be found 
in [WABFOT] ■ In |BBC09| the method is applied to a class of general ii regu- 
larized problems. In }GBO09j a predecessor of the proposed Douglas- Rachford 
approach was presented in the Split Bregman framework jSet09al and restricted 
to the two-class case. We provide an extension to the multi-class case, with 
proof of convergence and a sound stopping criterion. 



1.2 Contribution 

The paper is organized as follows: 

1. We formulate natural requirements on the regularizer J and show their 
implications on the choice of the interaction potential d (Sect. [3|. In par- 
ticular, d must necessarily be a metric under these requirements (Prop.[T]). 

2. Given such a metric, we study two possible approaches to extend regular- 
izers J on the relaxed set (Sect.|4]): 

• The "envelope" approach, which is a generalization of the method 



recently suggested by ChamboUe et al. (Sect. 4.3). While there is 
no simple closed form expression, we show that it can be used to 
construct a true extension for any metric d (Prop. [s]). 



• The "Euclidean distance" approach (Sect. 4.4), which yields exact 
extensions for Euclidean metrics d only but has a closed form ex- 
pression. We review some methods for the approximation of non- 
Euclidean metrics. 

We provide a unified continuous framework and show existence of a min- 
imizer. 

Both approaches lead to non-smooth convex problems, which can be stud- 
ied in a general saddle point formulation (Sect. [5]). Within this framework, 
we propose an improved binarization technique for nonstandard potentials 



to recover approximate solutions for the non-relaxed problem (Sect. 5.6) 



4. We provide and analyze two different methods that are capable of mini- 
mizing the specific class of saddle point problems (Sect. [6]): 

• A specialization of a method for nonsmooth optimization as sug- 



gested by Nesterov (Sect. 6.2 1. The method is virtually parameter- 
free and provides explicit a priori and a posteriori optimality bounds. 

A Douglas- Rachford splitting approach (Sect. [63] ). We show that the 
approach allows to compute a sequence of dual iterates that provide 
an optimality bound and stopping criterion in form of the primal-dual 
gap. 
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Both methods are highly parallelizable and are shown to converge. For 
reference, we also summarize the primal-dual technique from |PCBC09a] 
(Sect.[6l|). 



5. Finally, we illustrate and compare the above methods under varying con- 
ditions and demonstrate the applicability of the proposed approaches on 
real- world problems (Sect. [7]). 

In contrast to existing graph-based methods, we provide a continuous and 
isotropic formulation, while in comparison with existing continuous approaches, 
we provide a unified framework for arbitrary, non-uniform metrics d. The Eu- 
clidean metric method and Nesterov optimization have been announced in less 
generality in [LKY+091 iLBSnO] , 

2 Mathematical Preliminaries 

In the following sections we provide a reference of the notation used, and a brief 
introduction to the concept of functions of bounded variation and corresponding 
functionals. We aim to provide the reader with the basic ideas. For more 
detailed expositions we refer to [AFPOOi IZie89| . 

2.1 Notation 

In the following, superscripts denote a collection of vectors or matrix columns, 
while subscripts Vk denote vector components, i.e. we denote, for A £ M"^^', 

A = (ai|...|a') = (A„), A,j^{a^)^^al, 1 z rf, 1 ^ j L (10) 

An additional bracket v^-^^ indicates an element of a sequence («(*)). We will 
frequently make use of the Kronecker product |Gra81) 

. Jljnixmi ^ j^riaxms _^ ]]j(nin2 ) X (mi ma) ^^^-^ 

in order to formulate all results for arbitrary dimensions. The standard simplex 
in M' is denoted by A/ := {x e R^\x ^ 0,e^x = 1}, where e := (1, . . . , 1)^ e E'. 

is the identity matrix in M" and || • || the usual Euclidean norm for vectors 
resp. the Frobenius norm for matrices. Analogously, the standard inner product 
(•, •) extends to pairs of matrices as the sum over their elementwise product. 
Br{x) denotes the ball of radius r at x, and S'^~^ the set of a; S M'^ with 
||x|| = 1. The characteristic function xs(x) of a set S is defined as xs{x) = 1 
iS X €z S and xsi^) — otherwise. By Ss{x) ^ iS x £ S and 6s(x) — +oo 
otherwise we denote the corresponding indicator function. For a convex set 
C, crc(u) := sup^gc(u,?;) is the support function from convex analysis. J7(u) 
denotes the classical Jacobian of u. 

C^(r2) is the space of fc-timcs continuously diffcrentiable functions on ft with 
compact support, and Co(r2) the completion of C^(ri) under the supremum 
norm. As usual, denotes the d-dimensional Lebesgue measure, while 'H*'' 
denotes the /c-dimensional Hausdorff measure. For some measure fj, and set M, 
H^M denotes the restriction of n to M, i.e. {fj,^M){A) := n{M f] A). 
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2.2 Total Variation and BV 



The total variation will be our main tool to construct the regularizer J. For a 
difFerentiable scalar-valued function u, the total variation is simply the integral 
over the norm of its gradient: 

TV(u) = / \\\Iu\\dx. (12) 

As u is the designated labeling function, which ideally should be piecewise con- 
stant, the differentiability and continuity assumptions have to be dropped. In 
the following we will shortly review the general definition of the total variation 
and its properties. 

We require the image domain SI C to be a bounded open domain with 
compact Lipschitz boundary, that is the boundary can locally be represented as 
the graph of a Lipschitz-continuous function. For simplicity, we will assume in 
the following that = (0, 1)''. 

We consider general vector- valued functions u — (ui, . . . , u;) : il — > M' which 
are locally absolutely integrable, i.e. u £ Ll^^{iiy . As 17 is bounded this is 
equivalent to being absolutely integrable, i.e. u € L^{fiy. For any such function 
u, its total variation TV{u) is defined in a dual way [AFPOOl (3.4)] as 



TV(w) :— sup y / Ujdivv^dx— sup / {u,T)ivv)dx , 



(13) 



-.TV 



, T 



Divw :— (divu^, . . . ,divti' 
This definition can be derived for continuously differentiable u by extending 



( 12 1 to vector- valued u, 

TV(u) = / \\J{u)\\dx, (15) 



n 



replacing the norm by its dual formulation and partial integration. If u has 
finite total variation, i.e. TV(u) < cxd, u is said to be of bounded variation. The 
vector space of all such functions is denoted by BV(il)': 

BV{ny = {ue (ii(17))'|TV(M) <c5o} . (16) 

Equivalently, u £ BV(il)' iff w G L^(ri)' and its distributional derivative corre- 
sponds to a finite Radon measure; i.e. uj e L^(f2) and there exist M''-valued 
measures Duj = {DiUj, . . . ^D^Uj) on the Borel subsets B{Vl) of Vl such that 
[AFPOOl p.118] 

E ■=! L ^3 div vHx = - E'.i Eti /a , £ {Cfm"^' ■ (17) 

These measures form the distributional gradient Du = {pu\ \ . . . \Dui), which is 
again a measure that vanishes on any H'-'^^ ^■'-negligible set. If u G BV(r2) then 
|DM|(r2) = TV(u), where \Du\ is the total variation of the measure Du in the 
measure-theoretic sense [AFPOOl 3.6]. The total variation of characteristic func- 
tions has an intuitive geometrical interpretation: For a Lebesgue-measurable 
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subset 5 C M'', its perimeter is defined as the total variation of its characteris- 
tic function, 

Per(5) := TV(x5) • (18) 

Assuming the boundary dS is sufficiently regular, Per(5) is just the classical 
length {d = 2) or area {d = 3) of the boundary. 

2.3 Properties of TV and Compactness 

We review the most important ingredients for proving existence of minimizers 
for variational problems on involving BV involving TV. 

Convexity. As TV is the pointwise supremum of a family of linear functions, it 
is convex and positively homogeneous, i.e. TV(aw) — aTV(u) for a > 0. 
Lower Semicontinuity. A functional J is said to be lower semicontinuous with 
respect to some topology, if for any sequence (w'*^') converging to m, 

lim inf J(u('''') ^ J{u). (19) 

It can be shown that for fixed il, the total variation TV is well-defined on 
-^ioc(^)' ^"^^ lower semicontinuous in BV(r2)' w.r.t. the L\^^{^iy topology 
[AFPOOl 3.5,3.6]; hence also in L^{Vlf due to the boundedness of il. 
Compactness. For BV, instead of the norm topology induced by 

||u||bv := / \\u\\dx + T:Y{u), (20) 
Jn 

which makes BV(ri)' a Banach space but is often too strong, one frequently uses 
the weak* convergence: Define u^^'^ — ^ u weakly* iff 

1. u,u^^^ e BV(r2)' Vfc e N, 

2. wC^) ^ u in L^{VL) and 

3. Du^^^ — ?> Du weakly* in measure, i.e. 

VueCo(r2): lim / v dDu^'''> = [ vdDu. (21) 

For u,u^'^^ G BV(r2)' this is equivalent to -u^*^^ — > u in i^(fi)', and {u^''^) being 
uniformly bounded in BV(ll)', i.e. there exists a constant C < oo such that 
||w^''^||bv ^ C'Vfc € N jAFPOOl 3.13]. For the weak* topology in BV, a com- 
pactness result holds |AFPOO[ 3.23]: If (u^^)) C BV(rj)' and (uC^)) is uniformly 
bounded in BV(r2)', then (u^*^^) contains a subsequence weakly*-converging to 
some u e BY{ny. 

2.4 General Functionals on BV 

We will now review how general functionals depending on the distributional 
gradient Du can be defined. Recall that for any u G BV(r2)' the distributional 
gradient Du is a measure. Moreover, it can be uniquely decomposed into three 
mutually singular measures 

Du = D^u + D^u + D^u, (22) 
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that is: An absolutely continuous part D"", the jump part , and the Cantor 
part D'^. Mutual singularity refers to the fact that can be partitioned into 
three subsets, such that each of the measures is concentrated on exactly one of 
the sets. We will give a short intuitive explanation, see }AFP001 3.91] for the 
full definitions. 

The D° part is absolutely continuous with respect to the d-dimensional 
Lebesgue measure C^, i.e. it vanishes on any /^''-negligible set. It captures the 
"smooth" variations of u: in any neighborhood where u has a (possibly weak) 
Jacobian J'{u), the jump and Cantor parts vanish and 

Du = D^u = J{u)C'^ . (23) 

The jump part is concentrated on the set of points where locally u jumps 
between two values and u"*" along a (c? — l)-dimensional hypersurface with 
normal i^u £ S'^^^ (unique up to a change of sign). In fact, there exists a 
jump set J„ of discontinuities of u and Borel functions , u~ : J,j — ^ and 
Vu-Ju^ 5"^"^ such that [AFPOO, 3.78, 3.90] 

D^u = Du^Ju = Vu{u+ - u-yW^-^^Ju , (24) 

where ?^''~^lJ„ denotes the restriction of the {d — l)-dimensional Hausdorff 
measure on the jump set Ju, i.e. ('H^~^l Jti)(A) = 'H'^^^{Ju n A) for measurable 
sets A. The Cantor part captures anything that is left. 

As an important consequence of the mutual singularity, the total variation 
decomposes into \Du\ — \D°-u\ + \D^u\ + \D'^u\. Using this idea, one can define 
functional depending on the distributional gradient Du |AFPOO[ Prop. 2.34]. 
For u € BV(f2)' and some convex, lower semi-continuous : M'^^' — > M, define 

J{u} := / *(Du) / ^{J{u){x))dx + . . . 
Jn Jn 



{uu{x) {u+{x) - M"(x))^) dW^- 



Here ^'oo is the recession function ^'00(2;) — limt_j.oo '^^^j^ of and D'^ul\D'^u\ 
denotes the polar decomposition of D'^u, which is the density of D'^u with respect 
to its total variation measure jZ^'^uj. If \1/ is positively homogeneous, = \E' 
holds, and 



From (251 it becomes clear that the meaning of ^I^ acting on the Jacobian of u 
is extended to the jump set as acting on the difference of the left and right side 
limits of u at the discontinuity. This is a key point: by switching to the measure 
formulation, one can handle noncontinuous functions as well. 
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3 Necessary Properties of the Interaction Po- 
tential 



Before applying the methods above to the labehng problem, we start with some 
basic considerations about the regularizer and the interaction potential d. We 
begin by formalizing the requirements on the regularizer of the relaxed prob- 
lem as mentioned in the introduction. Let us assume we are given a general 
interaction potential d : {1, . . . , Z}^ — ^ M. Intuitively, d{i,j) assigns a weight to 
switching between label i and label j. We require 

> 0,t^j, (27) 

but no other metric properties (i.e. symmetry or triangle inequality) for now. 
Within this work, we postulate that the regularizer should satisfy 

(PI) J is convex and positively homogeneous on BV(51)'. 

(P2) J{u) — for any constant u, i.e. there is no penalty for constant labelings. 

(P3) For any partition of O into two sets S,S^ with Per(S') < oo, and any 
hj e {!,...,/}, 

J{e\s + e'Xs^) - d{z,j)Pey{S), (28) 

i.e. a change from label i to label j gets penalized proportional to d{i,j) 
as well as the perimeter of the interface. Note that this implies that J is 
isotropic (i.e. rotationally invariant). 

We require convexity in (PI) in order to render global optimization tractable. 
Indeed, if J is convex, the overall objective function ([6| will be convex as well 
due to the linearization of the data term. Positive homogeneity is included as 
it allows J to be represented as a support function (i.e. its convex conjugate is 
an indicator function and J = <tx' for some closed convex set P), which will be 
exploited by our optimization methods. 

Requirements (P3) and (P2) formalize the principle that the multilabeling 
problem should reduce to the classical continuous cut ^ in the two-class case. 
This allows to include boundary length-based terms in the regularizer that can 
additionally be weighted depending on the labels of the adjoining region (Fig.|4]). 
Together, these requirements pose a natural restriction on d: 

Proposition 1. Let {J,d) satisfy (PI) - (P3). Then d must necessarily he a 
metric, i.e. for all i,j, k G {1, . . . ,1}, 

1. d{i,i) = 0, 

2. d{ij)^d{j,i), yi^j, 

3. d is subadditive: d{i, k) ^ d{i,j) -f d(j, k). 

Proof. 1. follows from (P2) and (P3) by choosing i — j and S with Per(S') > 0. 
Symmetry in 2. is obtained from (P3) by replacing S with , as Per(5) — 
Per(5''). To show 3., first note that J{u) — 2J(u/2 + c/2) for any constant 
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Figure 4: Effect of choosing different interaction potentials. Top row: Tlie 
original image (left) is segmented into 12 regions corresponding to prototypical 
colors vectors. The Potts interaction potential penalizes the boundary length 
independently of the labels (right), which leads to a uniformly smooth segmen- 
tation. Bottom row: By modifying the interaction potential, the regulariza- 
tion strength is selectively adjusted to suppress background (left) or foreground 
(right) structures while allowing for fine details in the other regions. 



c e andaUw e BV(f7)', since J{u) = J(u+c-c) ^ 2 J((m+c)/2) + J(-c/2) = 
2J(w/2 + c/2) ^ J{u) + J{c) = J{u). Fix any set S with perimeter 



c := 



Per(S') > . 

Then, using the above mentioned fact and the positive homogeneity of J, 



cd{i, k) 



1 



(29) 

(30) 
(31) 

(32) 

(33) 
(34) 

□ 



Note that if the requirement \27\ is dropped, it is easy to show that if 
d{i,j) = for some i ^ j, then d{i,k) = d{j,k) for any k. In this case the 
classes i and j can be collapsed into a single class as far as the regularizer is 
concerned. The decision between label i and j is then completely local, i.e. 
depends only on the data term and can be postponed to a post-processing step 
by modifying the data term to 



c{d{i,])+d{],k)) . 



l{Si{x),Sj{x)} . 



(35) 



Thus (27) is not a real limitation and can be always assured. As a side note. 



it can be shown that, under some assumptions and in the space of piecewise 
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constant functions, the subadditivity of d already follows if J is required to be 
lower semicontinuous |Bra021 p. 88]. 

Proposition [l] implies that for non-metric d, we generally cannot expect to 
find a regularizer satisfying (P1)-(P3). Note that here J is not required to be 
of any particular form. In the following sections, we will show that on the other 
hand, if d is metric as in Proposition [TJ such a regularizer can always be con- 
structed. This implies that the interaction potentials allowing for a regularizer 
that satisfies (P1)-(P3) are exactly the metrics. 

4 Constructing Regularizers from the Interac- 
tion Potential 

We study next how to construct regularizers on BV(fi)' satisfying (P1)-(P3). 



As in ( 25 1 we set 

J{u) := I ^{Du) . (36) 



We additionally require ^' : M'^^' — > M^p to be a support function, i.e. for some 
closed, convex ^ X>ioc C M'*^^', 

-^{z) = a-Di,M = sup {z,v{x)). (37) 
As a support function, ^ coincides with its recession function ^oo, thus 
J{u) = / 'i'{J{u){x))dx + ... 

* {u+{x) - u-{x)y'^ dW^-^ + . . . 



n^A<i\D^u\. (38) 



Also, we have an equivalent dual formulation in analogy to (13), 



J{u) ^ sup{ / {u,V>wv)dx\v e C^;°{n f^ , v{x) e PiocVa; G il} . (39) 
Jn 

For simplicity we will also assume that I?ioc is rotationally invariant along the 
image dimensions, i.e. for any rotation matrix R G M''^'', 

w = (w\...,t;') GPioc ^ (i?w\...,W) ePioc. (40) 

This is equivalent to J being isotropic. 

We will now show under which circumstances a minimizer exists in BV(rJ)', 
and then see how the approach can be used to construct regularizers for specific 
interaction potentials. 

4.1 Existence of Minimizers 

The complete problem considered here is of the form (cf. ([6| and (38)) 



inf/(M), fiu):^ {u,s)dx + J{u) (41) 



n 



13 



where J{u) — J^'^{Du) as in (361, 

*(z) = sup {z,v{x)) (42) 

for some closed convex X'joc ^ 0, and 

C = {ue BY{ny\u{x) e A, ^''-a.e. x e n} . (43) 



Note that / is convex, as it is the pointwise supremum of affine functions ( 39 1 . 
Again for simpUcity we set fl — (0,1)'^ . Then we have the following 

Proposition 2. Let 2?ioc ^ 9 be closed convex, — <tvi„^, s E L^(il)', and 

f{u)^ I {u,s)dx+ I ^{Du). (44) 
Jn Jn 

Additionally assume that I?ioc ^ I^Pui^) — I^''^' foi" some < p^. Then f is 
lower semicontinuous in BV(51)' with respect to convergence. 

Proof. As the data term is continuous, it suffices to show that the regularizer 
is lower semicontinuous. This is an application of |AFPOO[ Thm. 5.47]. In fact, 
the theorem shows that / is the relaxation of / : C^(ri)' — >■ M, 

f[u):= I {u,s}dx+ I •^{Ju{x))dx, (45) 
Jn Jn 

on BV(f2)' w.r.t. Lj^^ (thus here L^) convergence and thus lower semicontinuous 
in BV(il)'. To apply the theorem, we have to show that 5* is quasiconvex in 
the sense of 'AFPOO' 5.25], which holds as it is convex by construction. The 
other precondition is (at most) linear growth of which holds with ^ ^'(x) ^ 
pjx\\. □ 

Proposition 3. Let f, 5*, s as in Prop. and additionally assume that 

BpMri{{v\...,v')\Y,v' C 6p„ (0) (46) 

i 

for some < pi ^ p„. Then the problem 



min/(u) (47) 



has at least one minimizer. 



Proof. From the inner and outer bounds it holds that pi\\z\\ < ^{z) < Pu\\A\ for 
any z = {z^\ . . .\z^) £ W^^^ with z^ + . . . + z^ =0. Moreover, the constraint u G C 
implies that Du = {Dui \ . . . \Dui) satisfies Dui + . . . + Dui = 0. Combining 
this with the positive homogeneity of it follows from (26) that 

s$ TV(u) < J{u) < pu TV(u) . (48) 

From 

u,s)dx^~ ||u(a;)|loo|ls(a;)||ida;, (49) 
n Jn 



14 



the fact that s S L^{fiy, and boundcdncss of fl and A;, it follows that the data 
term is bounded from below on C. 

We now show coercivity of / with respect to the BV norm: Let (u*^'"'-') C C 
with IIm^'^^IIi + TV(m^'^^) — > oo. As the data term is bounded from below, and 
using the fact that J(u('')) ^ pi TV(w('='), it follows that /(u^^)) +00. Thus 
/ is coercive. 



Equations ( [48^ and ( |49[ ) also show that / is bounded from below. Thus we 
can choose a minimizing sequence (it'-'^-*). Due to the coercivity, the sequence 
_l_ TV(u('^^) must then be bounded from above. From this and |AFP001 
Thm. 3.23] we conclude that there is a subsequence of (u^'"'^) weakly*- (and 
thus L^-) converging to some u G BV(r2)'. With the lower semicontinuity from 
Prop. [2] and closedness of C with respect to convergence, existence of a 
minimizcr follows. □ 

4.2 Relation to the Interaction Potential 

To relate such J to the labeling problem in view of (P3), we have the following 

Proposition 4. Let ^E* — <tx>i^^ and J{u) = J^^ 'i'{Du) as in Prop, pi For some 
u' e BV(r2) and vectors a,b G Ai, let u{x) = (1 — u'{x))a + u'{x)o. Then for 
any vector y G M'' with \\y\\ = 1, 

J{u) = *(y(5-a)^)TV(u') = ( sup ||w(6-a)|| ) TV(u'). (50) 

In particular, if'^ {y{e^ — 6-')^) = d{i,j) for some y with = 1, then J fulfills 
(PS). 



Proof. To show the first equality, ( 26 ) implies 



Ji-) = / * ( ^) (51) 



D{a + u'{b- a))\ 



We make use of the property \{Du'){b — a)^\ = \Du'\\\b — a\\, which is a direct 
consequence of the definition of the total variation measure and the fact that 
11^(6 — a)^|| = ||w|| ||6 — a\\ for any vector w G R'^ (note that a, 6 G are also 
vectors). Therefore 

which by positive homogeneity of implies 

Since the density function Du'/\Du'\ assumes values in S'^~^, there exists, for 
a.e. a; G ri, a rotation matrix mapping (Du' /\Du'\){x) to y. Together with the 
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rotational invariance of I?ioc from ( |40[ ) this implies 

J{u) - / ^{y{h-a)^)\Du'\^-^{y{b-ay)TY{u'), (56) 



which proves the first equality in ( |50[ ). The second equality can be seen as 
follows: 

r :— sup — a)|| (57) 

= sup sup {z,v{b~a)) (58) 
I'Sfioc 2eM<*,||z|Ki 

= sup sup {z,v{b~'a)). (59) 

Denote by Rz a rotation matrix mapping z to y, i.e. RzZ — y, then 

r = sup sup {RzZ, Rzv{b — a)) (60) 

= sup sup {y,v'{b — a)). (61) 

2eR'',||2Ki v'eF/.^v,ac 

The rotational invariance of I?ioc provides Rz'Dioc — 2?ioc, therefore 

r = sup sup {y,v{b—a)) (62) 

zeK'',||z||^i f ex'ioc 

= sup (y,i;(&-a)) =vl,(y(5_a)T). (53) 

□ 

As a consequence, if the relaxed multiclass formulation is restricted to two 
classes by parametrizing u = (1 — u')a + u'b for u'{x) E [0,1], it essentially 
reduces to the scalar continuous cut problem: Solving 



mm 

M'eBV(O), 

M'(a;)G[0,l],£''-a.G. xeil 



{{l-u')a + u'b,s)dx + J{u) (64) 



is equivalent to solving 

min / u' {b - a)dx + ^{y{b ~ a)^)TY{u') (65) 

M'eBV(n), 

ti'(£i;)e[0,l],£''-a.c. xen 

which is just the classical binary continuous cut approach with data (6 — a) and 
regularizer weight ^{y{b — a)^), where y E M'^ is some arbitrary unit vector. 
For the multiclass case, assume that 

u ~ Up = e^xpi + . . . + e'xp' (66) 

for some partition U . . . U P' =0 with Per(P*) < 00, i — 1, . . . ,1. Then the 
absolutely continuous and Cantor parts vanish [AFPOOl Thm. 3.59, Thm. 3.84, 
Rem. 4.22], and only the jump part remains: 

J{up) = f ^ {iyuAup+ - up-)^^ dn''-\ (67) 
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f 




Figure 5: Illustration of the set I?ioc used to build the regularizer for the uniform 
distance, d{i,j) = iff i = j and d{i,j) = 1 otherwise, for / = 3 and d = 1, 
i.e. in a one-dimensional space. Shown is a cut through the + + z'^ = 
plane; the labels correspond to the points e* — e' with i ^ j- The local 
envelope method leads to a larger set I?ioc (dashed) than the Euclidean metric 
method (solid). This improves the tightness of the relaxation, but requires more 
expensive projection steps. 



where Sup — Ui=i ; '^^^ union of the interfaces between regions. Define 

i{x),j{x) such that upj^{x) = e**^^^ and up^{x) = e-''^'. Then 



J {up) = ^ vl,(^j.„^ (e'(-)-e^"(-))^')dH'^-i. (68) 
By rotational invariance. 



J {up) = J ^(^y (e'(") - eJ(")) dH^-^ 



(69) 



for some y with ||y|| = 1. Thus the regularizer locally penalizes jumps between 
labels i and j along an interface with the interface length, multiplied by the 
factor ^(j/(e' — e^)^) depending on the labels of the adjoining regions. 

The question is how to choose the set X>ioc such that ^(^(e* — e^ )^) = d{i, j) 
for a given interaction potential d. We will consider two approaches which 
differ with respect to expressiveness and simplicity of use: In the local envelope 



approach (Sect. 4.3), Pioc is chosen as large as possible. In turn, J is as large as 



possible with the integral formulation (36 1. This prevents introducing artificial 



minima generated by the relaxation, and potentially keeps minimizers of the 
relaxed problem close to minimizers of the discrete problem. However, 2?ioc 
is only implicitly defined, which complicates optimization. In contrast, in the 



embedding approach (Sect. 4.4 1, Vioc is simpler at the cost of being able to 
represent only a subset of all metric potentials d. For an illustration of the two 
approaches, see Fig. [5j 
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4.3 Local Envelope Method 

Chambolle et al. |CCP08| proposed an interesting approach for potentials d of 
the form d{i,j) = 7(|z — j|) for a positive concave function 7. The approach is 
derived by specifying the value of J on discrete u only and then constructing 
an approximation of the convex envelope by pulling the convexification into the 
integral. This potentially generates tight extensions and thus one may hope that 
the convexification process does not generate too many artificial non-discrete 
solutions. 

We propose to extend this approach to arbitrary metric d by setting (Fig. [S]) 
■■= f]{v^{v\...,v')eR'''''\\y-v^\^d{t,j),Y,v''=0}{70) 

for some given interface potential d{i,j). By definition 'Df^^ is rotationally in- 
variant, and by the considerations in Sect. [3] we assume d to be a metric. Then 
the inner and outer bound assumptions required for existence of a minimizer in 
Prop. [3] are satisfied. Moreover, d can be reconstructed from Vf- 

Proposition 5. Let d : {1, . . . , l}^ — > M^o be a metric. Then for any i,j, 

sup ((^^Oi-Mi) - dit,j). (71) 



Proof. follows from the definition (70). can be shown using a network 
flow argument: We have 

sup {{v^)^-{v^)^) (72) 
^ supfe -pj I p £ R\J2Pk = Oyi',f ■■P^' -Pf ^ d{i',f)} (73) 

k 

'-^ supfe -p, I p e : p,, -py < d{i',j')} (74) 



d{i,j). (75) 



Equality (*) holds since each p in the set in (74) can be associated with p 
p— J X]fe Pfc I which is contained in the set in ( 73 ) and satisfies and Pi—pj = Pi—pj ■ 



The last equality (**) follows from [Mur03. 5.1] with the notation ^ — d (and 
7 = d, since d is a metric and therefore the triangle inequality implies that the 
length of the shortest path from i to j is always d{i,j)). □ 

The final result of this section is the following: 

Proposition 6. Let d : M'^' M^o be a metric. Define 2?ioc ■— T^f^^ as in {70), 

'■= '^Vf '^^d 

Jd := / ^d{.Du) (76) 
Jn 

as in Then Jd satisfies (P1)-(P3). 

Proof. (PI) and (P2) are clear from the definition of Jd- (P3) follows directly 
from Prop. [5] and Prop. |4]with y — . □ 
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Defining Vf^^ as in ( 70 ) provides us with a way to extend the desired regu- 



larizer for any metric d to non-discrete u G C via ( 38 ) . The price to pay is that 
there is no simple closed expression for and thus for J^, which potentially 
complicates optimization. Note that in order to define "Df^^, d does not have to 
be a metric. However Prop. [5] then does not hold in general, so J is not a true 
extension of the desired regularizer, although it still provides a lower bound. 

4.4 Euclidean Metric Method 

In this section, we consider a regularizer which is less powerful but more efficient 



to evaluate. The classical total variation for vector- valued u as defined in ( 13 ) 
is 

TV(u) = [ \\Du\\ . (77) 
Jn 

This classical definition has also been used in color denoising and is also referred 
to as MTV |SR96linSV08] . We propose to extend this definition by choosing 
an embedding matrix A € M*^^' for some k ^ I, and defining 

Ja{u) TY{Au). (78) 

This corresponds to substituting the Frobcnius matrix norm on the distribu- 



tional gradient with a linearly weighted variant. In the framework of (38), it 
amounts to setting Vioc = (cf. Fig. ^ with 



Vt KA|i;'eM^xM|«'|| < l} = Si(0)A (79) 

Clearly, G V{^^ and 

^'(z) — (jj,A [z) ~ sup {z^v') = sup {z,vA) (80) 
'™ u'eBi(o)A t)eBi(o) 

sup {zA^ ,v) = \\zA^\\ . (81) 
i;eei(o) 

In particular, we formally have 

■^{Du) = \\{Du)A'^\\ = \\D{Au)\\, (82) 
as w I— >■ Du is linear in u. To clarify the definition, we may rewrite this to 

TYa{u) = J^^^\\Diu\\\ + ... + \\Dau\\\, (83) 

where \\v\\a '■— {v'^ A'^ AvY^^ . In this sense, the approach can be understood as 
replacing the Euclidean norm by a linearly weighted variant. 

It remains to show for which interaction potentials d assumption (P3) can 
be satisfied. The next proposition shows that this is possible for the class of 
Euclidean metrics. 

Proposition 7. Let d be an Euclidean metric, i.e. there exist k Cz N as well 
as a^,.. . ,a^ G M'^ such that d{i,j) = ||a* — a^\\. Then for A = [a^\ . . . |a'), 
J A ■■= TVa satisfies (P1)~(P3). 
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Proof. (PI) and (P2) are clearly satisfied. In order to show (P3) we apply 
Prop. |4] and assume \\y\\ = 1 to obtain with (81) 



*(y(e^-e^y) - \\y{e' - e^)'^ A' \\ (84) 
= ||2/(a'-a^y|| -||a'-a^"||. 

□ 

The class of Euclidean metrics comprises some important special cases: 

• The uniform, discrete or Potts metric as also considered in |ZGFN08l 
|LKY+09| and as a special case in jKT991 IKTOT] . Here d{ij) = iff i = j 
and d{i,j) = 1 in any other case, which corresponds to ^ = {1/^/2)1. 

• The linear (label) metric, d(i,j) — c\i — j\, with A ~ (c, 2c, . . . , Ic). This 
regularizer is suitable to problems where the labels can be naturally or- 
dered, e.g. depth from stereo or grayscale image denoising. 

• More generally, if label i corresponds to a prototypical vector in k- 
dimensional feature space, and the Euclidean norm is an appropriate met- 
ric on the features, it is natural to set d{i,j) — — z^\\, which is Eu- 
clidean by construction. This corresponds to a regularization in feature 
space, rather than in "label space" . 

Note that the boundedness assumption involving pi required for the existence 
proof (Prop. [3| is only fulfilled if the kernel of A is sufficiently small, i.e. 
ker A C {te\t G E}, with e = (1, . . . , 1)^ G M': Otherwise, V{^^ = (^i(O)^) n 
{{v^, . . . , w')l J2i = 0} is contained in a subspace of at most dimension {l — 2)d, 



and (46) cannot be satisfied for any pi > 0. Thus if d is a degenerate Euclidean 
metric which can be represented by an embedding into a lower-dimensional 
space, as is the case with the linear metric, it has to be regularized for the ex- 
istence result in Prop. [3] to hold. This can for example be achieved by choosing 
an orthogonal basis {b^, . . . jV) of ker A, where j = dimker A, and substituting 
A with A' := {A^ ,eb^, . . . , sV)^ , enlarging k as required. However these obser- 
vations are mostly of theoretical interest, as the existence of minimizers for the 
discretized problem follows already from compactness of the (finite-dimensional) 
discretized constraint set. 

Non-Euclidean d, such as the truncated linear metric, d{i,j) — min{2, 
cannot be represented exactly by TV^. In the following we will demonstrate 
how to construct approximations for these cases. 

Assume that d is an arbitrary metric with squared matrix representation 
D G i.e. = d{i,jf. Then it is known jBG05) that d is Euclidean iff for 

C := /— fee^, the matrix T := —\CDC is positive semidefinite. In this case, d 
is called Euclidean distance matrix, and A can be found by factorizing T = AJ A. 
If T is not positive semidefinite, setting the nonnegative eigenvalues in T to zero 
yields an Euclidean approximation. This method is known as classical scaling 
[BG05j and does not necessarily give good absolute error bounds. 

More generally, for some non-metric, nonnegative d, we can formulate the 
problem of finding the "closest" Euclidean distance matrix E as the minimiza- 
tion problem of a matrix norm \\E — D\\m over all E E £i, the set oi I x I Eu- 
clidean distance matrices. Fortunately, there is a linear bijection B : Vi-i £i 
between £i and the space of positive semidefinite (Z — 1) x (Z — 1) matrices Vi-i 
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Figure 6: Euclidean embeddings into M.^ for several interaction potentials with 
four classes. Left to right: Potts; linear metric; non-Euclidean truncated linear 
metric. The vertices correspond to the columns a^, . . . , a' of the embedding ma- 
trix A. For the truncated linear metric an optimal approximate embedding was 
computed as outlined in Sect. 4.4 with the matrix norm :— maxj 



M J l^ij I 



[Gow85[ IJT95j . This allows us to rewrite our problem as a semidefinite program 
[WSVnni (p.534-541)] 

inf \\B{S)-D\\m. (85) 



Problem (85) can be solved using available SDP solvers. Then E 

},CEC 



and A can be extracted by factorizing 



B{S) e Eu 
Since both E and D are 



explicitly known, ee '■= max. 



1/2 



1/2 I 



can be explicitly computed 



and provides an a posteriori bound on the maximum distance error. Fig.|6]shows 
a visualization of some embeddings for a four-class problem. In many cases, in 
particular when the number of classes is large, the Euclidean embedding provides 
a good approximation for non-Euclidean metrics (Fig. [7| . 

Based on the embedding matrices computed in this way, the Euclidean dis- 
tance approach can be used to solve approximations of the labeling problem 
with arbitrary metric interaction potentials, with the advantage of having a 
closed expression for the regularizer. 



5 Discretized Problem 



5.1 Saddle Point Formulation 



We now turn to solving the discretization of the relaxed problem (41 ). In order 



to formulate generic algorithms, we study the bilinear saddle point problem. 



min max g (u, v) , g{u,v) := (m, s) + {Lu, v) — (6, v) 



(86) 



As will be shown in Sect. 5.2 5.4 this covers both Jd (76) and Ja (78) as well 
as many other - even non-uniform and non-isotropic - regularizers. 

In a slight abuse of notation, we will denote by w, s G M" also the discretiza- 
tions of u and s on a uniform grid. Furthermore we require a linear operator 
L e M™^", a vector b E M™ for some m,n E N, and bounded closed convex 
sets C C C E™. Intuitively, L discretizes the gradient operator and T> 

corresponds to I?ioc, i-e- specifies \1/ in a dual formulation. The primal and dual 
objectives are 



fin) 



max 5 (it, v) 



and 



foiv) := min g{u,v) , 



(87) 
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d(i,l) 




Figure 7: Euclidean approximation of the Non-Euclidean truncated linear met- 
ric with interaction potential d{i,j) = v^/Sminjli — j'|,16}. Left to Right: 
Original potential for 64 classes; potential after Euclidean approximation. Bot- 
tom: cross section of the original (dashed) and approximated (solid) metric at 
i = 1. The approximation was computed using semidefinite programming as 



outlined in Sect. 4.4 It represents the closest Euclidean metric with respect to 
the matrix norm ||X — y ||m X)i j ~ The maximal error with respect 
to the original potential is = 0.2720. 



respectively. The dual problem then consists of maximizing /d(w) over V. As 
C and V are bounded, it follows from jRoc70[ Cor. 37.6.2] that a saddle point 
{u*,v*) of g exists. With |Roc70| Lemma 36.2], this implies strong duality, i.e. 

min/(u) = /(«*)= 9iu*,v*) = foiv*) = max/z5(^;) . (88) 

In our case, 0,7) exhibit a specific product structure, which allows to compute 
fn as well as the orthogonal projections lie and li-D efficiently, a fact that 
will prove important in the algorithmic part. The evaluation of the primal 
objective / on the other hand can be more difficult, depending on the definition 
of T>\oc resp. 2?, but is not required by the optimizer. Note that in the discrete 
framework, we may easily substitute non-homogeneous, spatially varying or even 
nonlocal regularizers by choosing L and h appropriately. 
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Algorithm 1 Projection of y £ onto the standard simplex A; 

2: repeat 

(k+i) f 0' * e ^^''^ ' 

4: ZC^+I) ^ e {1, . . . , l}\i e ZC^) or < 0}. 

5: yf+^Vmax{#+^\0} (^ = l,...,/). 
6: k ^ k + 1. 

7: until yC^+i) ^ 0. 



5.2 Discretization 

We discretize by a regular grid, f2 = {!,..., rii} x • • • x {!,..., n^} C M"^ 
for d G N, consisting of n := pixels in lexicographical order. 

We represent u by its values u = {u^ \ . . .\u'') S E"^' at these points, and 
denote by = 1, . . . , n, the j-th row of u, corresponding to u{x^). 

The multidimensional image space M"^' is equipped with the Euclidean inner 
product (•, •) over the vectorized elements. We identify u G M"^' with the vector 
in M"' obtained by concatenating the columns. 

Let grad :— (grad^l . • . |gradj)^ be the d-dimensional forward differences 
gradient operator for Neumann boundary conditions. Then div :— — grad^ is 
the backward differences divergence operator for Dirichlet boundary conditions. 
These operators extend to M"'^' via Grad := {Ii (E) grad), Div := (// ig) div). We 
define, for some k ^ I as will be specified below, the convex sets 

C:={MeM"^V"' e A,,j f,...,n}, (89) 
V -.^ YiVioc'^R"'''^'"' ■ (90) 

The set I?ioc and the operator L depend on the chosen regularizer. Note that 
for L Grad, k = I and 

Pzoe := ■.^{p=ip'\... b') € M'^^^'IIIpII ^1} , (91) 

the primal objective in the saddle point formulation discretizes the classical 
vector-valued total variation, 

n 

TV{u) aDivp(u) = c7i,(Gradu) = ^ \\G,,u\\ , (92) 

where each of the G^-j is an (Id) x {nl) matrix composed of rows of (Grad) such 
that G^D u constitutes the vectorized discrete Jacobian of u at the point . 

Projections on the set C are highly separable and can be computed exactly 
in a finite number of steps |Mic86l Alg. 4], see Alg. [l] for a summary of the 
pointwise operation. The dual objective is 

f^{v) ^- (5, v) + min(u, L'^v + s) . (93) 

Since the minimum decouples spatially, and miuygA; (y, -z) = vecmin(z) := 
mini Zi for all z £ MJ, the dual objective can always be evaluated by summing. 
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over all points x E ft, the minima over the components of L^v + s corresponding 
to each point, denoted by {L^v + s)^j , 

n 

/i?M = -(6,v)+^vecmin((L^i; + s),.) . (94) 

This fact is helpful if evaluating the primal objective is costly because the dual 
set I?ioc has a complicated structure. 

5.3 Specialization for the Local Envelope Method 

For a metric interaction potential d : {1, . . . , — >■ K^qj we set k := I and 

L := Grad , 

2?ioc := Vt,^f]{v^{v'\...\v')eR''^'\\\v^-v=\\^d{z,j)} (95) 



as in (70). We arrive at the saddle point form (86 1 with C, I?, and L defined as 
above, m — nl and 6 = 0. However due to the generality of the regularizer (cf. 
Prop. [6]), the primal objective / cannot be easily computed anymore (for the 
special case of three labels there is a derivation in |CCP08| ). Projections on T) 
can be computed for all a; £ f2 in parallel, while projections on the individual 
sets Pf^j. can be computed by the Dykstra algorithm, cf. [CCPOSj and Sect. 6.4 



5.4 Specialization for the Euclidean Metric Method 

For A G M*^^' as in (|78|, the Euclidean regularizer can be obtained by setting 



L := (Grad)(A(8)/„), 



loc 



:= VL = {v = {v'\.-.\v')eR''''''\\\v\\^l} 



(96) 



Departing from the definition (79 1 of 2?f^j., A is merged into L, as the optimiza- 
tion method will rely on projections on 2?ioc- Including A into 2?ioc, i-e. by 
setting 



(A®/„)T2?L 



(97) 



and L := Grad, would prevent computing the projection in closed form. Project- 
ing on the unit ball "D^^^ on the other hand is trivial. The discretized regularizer 
can be explicitly evaluated, since 



*(z) = \\zA^ 



Comparison to ( 92 1 yields 



JAiu) = TV((A®/„)u). 



(98) 



(99) 



We finally arrive at the form (86) with C, V, and L defined as above, m = nk 
and 6 = 0. Projections on V are highly separable and thus can be computed 
easily. The primal objective can be evaluated in closed form using (99). 



24 



5.5 Optimality 



If fo and / can be explicitly computed, any v € V provides an optimality 
bound on the primal objective, with respect to the optimal value /(«*), via the 
numerical primal-dual gap f{u) — 

0^f{u)-f{u*)^f{u)~fD{v). (100) 

Assuming / and /^j can be evaluated, the gap is a convenient stopping criterion. 
To improve the scale invariance, it is often practical to stop on the relative gap 

m_fD{vi 

Jd{v) 

instead, which gives a similar bound. However convergence in the objective 
alone does not necessarily imply convergence in u, as the minimizer of the 



original problem ( 86 ) is generally not unique. This stands in contrast to the 
well-known ROF-type problems |ROF92j , where the uniqueness is induced by a 
quadratic data term. 

For some applications, after solving the relaxed problem a discrete solution 
- or "hard" labeling - still needs to be recovered, i.e. the relaxed solution 
needs to be binarized. For the continuous two-class case with the classical 
TV regularizer, |UEN06j showed that an exact solution can be obtained by 
thresholding at almost any threshold. However, their results do not seem to 
transfer to the multi-class case. 

Another difficulty lies in the discretization: In order to apply the thresh- 
olding theorem, a crucial "coarea"-like property must hold for the discretized 
problem |CD09| . which holds for the graph-based pairwise £i-, but not the 
higher order £2-discretization of the TV. Thus, even in the two-class case, sim- 
ple thresholding may lead to a suboptimal discrete solution. 

Currently we are not aware of an a priori bound on the error introduced by 
the binarizatio n ste p in the general case. In practice, any dual feasible point 



together with ( 100 1 yields an a posteriori optimality bound: Let 
a pair of primal resp. dual feasible iterates, m^^^ the result of the binarization 
step applied to u^^\ and u* the optimal discrete solution. Then m(^) is primal 
feasible, and its suboptimality with respect to the optimal discrete solution is 
bounded from above by 

/(^l(^)) - f{u*) < /(uW) - f{u*) < /(SW) - /z,(t;W) . (102) 

Computation of fo, and in many cases also /, is efficient as outlined in Sect. |5.2"| 



5.6 Improving the Binarization 

There seems to be no obvious best choice for the binarization step. The simplest 
choice is the first-max approach: the label i(x) is set to the index of the first 
maximal component of the relaxed solution u'^-'^\x). However, this might lead 
to undesired side effects: Consider the segmentation of a grayscale image with 
the three labels 1,2,3 corresponding to the gray level intensity, together with 
the linear distance d{i,j) = \i — j\, which is exactly representable using the 
Euclidean distance approach with A = (12 3). Assume there is a region where 
tt(^) = (1/3 -I- S{x), 1/3, 1/3 - S{x)) for some smah noise S{x) e M. The most 
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"natural" choice given the interpretation as grayscale values is the constant 
labeling £(x) = 2. The first-max approach gives i(x) G {1, 3}, depending on the 
sign of S{x), which leads to a noisy final segmentation. 

On closer inspection, the first-max approach - which works well for the Potts 
distance - corresponds to choosing 

£{x) = arg min - e^|l , (103) 
^€{1,...,;} 

with the smallest £ chosen in case of ambiguity. We propose to extend this to 
non-uniform distances by setting 

e{x) = arg min ^ (u{x) - e^) , (104) 
ee{i,...:i} 

§ : M' ^ M,§(z) ^ (e^^^) • 

That is, we select the label corresponding to the nearest unit vector with respect 
to ^ (note that instead of we could choose any normalized vector as ^E" is 
assumed to be rotationally invariant). In fact, for the linear distance example 
above we have ^{z) = \ — zi + z^]. Thus 

^{u{x)-e^) = \l~2S{x)\, (105) 
^{u{x)-e^) = \26{x)\, 
^{u{x)-e^) = \l + 26{x)\, 

and for small S we will get the stable and semantically correct choice i{x) — 2. 
This method proved to work well in practice, and considerably reduced the 



suboptimality introduced by the binarization step (Sect. 7.5). In case there is 
no closed form expression of it can be numerically approximated as outlined 
in Sect. EH 



6 Optimization 



When optimizing the saddle point problem (861, one must take care of its large- 
scale nature and the inherent nonsmoothness of the objective. While interior- 
point solvers are known to be very fast for small to medium sized problems, 
they are not particularly suited well for massively parallel computation, such as 
on the upcoming GPU platforms, due to the expensive inner Newton iterations. 

We will instead focus on first order methods involving only basic operations 
that can be easily parallelized due to their local nature, such as evaluations 
of L and and projections on C and V. The first two methods are stated 
here for comparison, the third one is new. It additionally requires evaluating 
(/ + L)^^ , which is potentially non-local, but in many cases can still be 
computed fast and explicitely using the Discrete Cosine Transform as shown 
below. 

The following approaches rely on computing projections on the full sets V 
resp. I'loc, which requires an iterative procedure for non-trivial constraint sets 



such as obtained when using the local envelope method (Sect. 6.4). We would 
like to mention that by introducing further auxiliary variables and a suitable 
splitting approach, these inner iterations can also be avoided |LBS10] . However 
the number of additional variables grows quadratically in the number of labels. 
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Algorithm 2 FPD Multi-Class Labeling 



1: Choose w(0) e M"><',w(0) e R^ixdxl^ 

2: Choose Tp > 0, Ti3 > 0, AT e N. 

3: for fc = 0, . . . , iV - 1 do 

4: t;(fc+i)^ni,(t>«+Ti3(LuW-6)). 

5, ^ He (uC^) - TP (L^vC^+i) + 5)). 

7; end for 



therefore the approach is only feasible for relatively small numbers of labels and 
memory requirements are usually one to several orders of magnitude higher than 
for the approaches discussed next. 

6.1 Fast Primal-Dual Method 



One of the most straightforward approaches for optimizing ( 86 ) is to fix small 
primal and dual step sizes Tp resp. td, and alternatingly apply projected gradi- 
ent descent resp. ascent on the primal resp. dual variables. This Arrow- Hurwicz 
approach |AHU64j was proposed in a PDE framework for solving the two-class 
labeling problem in |AT06| and recently used in [CCP08| . An application to 
denoising problems can be found in [ZCOSj . However it seems nontrivial to de- 
rive sufScient conditions for convergence. Therefore in [PCBC09a| the authors 
propose the Fast Primal-Dual (FPD) method, a variant of Popov's saddle point 
method |Pop80| with provable convergence. The algorithm is summarized in 
Alg.H 

Due to the explicit steps involved, there is an upper bound condition on 
the step size to assure convergence, which can be shown to be TpTo < 
[PCBCOQa] . The operator norm can be bounded according to 

||L|| II Gradll 2y/d (106) 

for the envelope method, and 

||L|| sC II GradllPII sC 2vQ\\A\\ (107) 

for the Euclidean metric method. As both the primal and dual iterates are 
always feasible due to the projections, a stopping criterion based on the primal- 



dual gap as outlined in Sect. 5.5 can be employed 



6.2 Nesterov Method 

We will provide a short summary of the application of Nesterov's multi-step 



method |Nes04j to the saddle point problem (86) as proposed in [LBS09| . In 

contrast to the FPD method, it treats the nonsmoothness by first applying 
a smoothing step and then using a smooth constrained optimization method. 
The amount of smoothing is balanced in such a way that the overall number of 
iterations to produce a solution with a specific accuracy is minimized. 

The algorithm has a theoretical worst-case complexity of 0{l/e) for find- 
ing an e-optimal solution, i.e. f{u'^^^) — f{u*) ^ e. It has been shown to 
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Algorithm 3 Nesterov Multi-Class Labeling 



1; Let ci e C, C2 e I? and ri,r2 G M such that C C Br^ici) and 2? C Sr2(c2); 
IILII. ■ 



Choose e C and iV e N. 
Let 1^^. 
Set G(-i) ^ 0,w(-i) ^ 0. 
for fc = 0, . . . , iV do 

1/ ^ Hi, (c2 + i (La;^^) - b) 

^ik) ^ UjC^-l) + (fc + 

. 2 (fc) 

" ^ (fe+l)(fe+2) ■ 

G ^ s + L'^V. 



2: 
3: 
4: 
5: 
6: 

7: 
8: 

9: 
10: 

11: ^ He (^X^*^) - ^G1. 

12: zC^) ^ He (ci - ppGC-' 

13: ^ ^3-^'=) + (l - ^a) 

14: end for 



give accurate results for denoising |Auj08| and general ^i-norm based problems 
[WABFOTl [BBCQ9, . Besides the desired accuracy, no other parameters have to 
be provided. The complete algorithm for our saddle point formulation is shown 
in Alg.[3} 

The only expensive operations are the projections lie ^nd Tlv, which are 
efficiently computable as shown above. The algorithm converges in any case 
and provides explicit suboptimality bounds: 

Proposition 8. In Alg.^ the iterates u^'^\v^^^ are primal resp. dual feasible, 
i.e. u'-'"'-' G C, v'^'^^ e T). Moreover, for any solution u* of the relaxed problem 
(86), the relation 

f{u^^^) - fin*) < /(uW) - /^(«(^)) < (108) 
holds for the final iterates u^-^^ ,v'--^) . 

Proof Apply jNes04[ Thm. 3] with the notation f{u) = {u,s), L, = 

(6,u), di{u) := i||it-ci|p, d2iv) := - C2IP, Di = ^rj, D2 = ^rf, ai = 

(72 = I, A/ = 0. □ 

Corollary 1. For given e > 0, applying Alg.^^wiih 

N = \2rir2Ce-^ - l] (109) 



yields an e-optimal solution of {86), i.e. f{u^^^) — f{u*) ^ e. 



For the discretization outlined in Sect. 5.2 we choose ci = je and ri — 
y^n{l — l)/l, which leads to the following complexity bounds to u'-^^ with re- 
spect to the suboptimality e. 
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• Envelope method (95). From the previous remarks, C :~ 2\fd ^ 



Moreover C2 = and by Prop. 10 (see Appendix), we have Pj^^ C Sq^(O) 
with 




and thus ri = a^ypn. The total complexity in terms of the number of 
iterations is 



0{e-^n^^dad) . (Ill) 



Euclidean metric method {96). Here we may set C = 2'\/(i||^||, C2 = and 



"^2 = \pn for a total complexity of 

0{e-^n4d\\A\\) . (112) 

We arrive at a parameter-free algorithm, with the exception of the desired sub- 
optimality bound. From the sequence (vS^\v'^^)') we may again compute the 
current primal-dual gap at each iteration. As a unique feature, the number of 
required iterations can be determined a priori and independently of the variables 
in the data term, which could prove useful in real-time applications. 

6.3 Douglas-Rachford Method 

We demonstrate how to apply the Douglas-Rachford splitting approach |DR56| 
to our problem. By introducing auxiliary variables, we can again reduce the 
inner steps to projections on the sets C and T). This is in contrast to a more 



straightforward splitting approach such as (LKY+OQ] . where the inner steps 
require to solve ROF-type problems that include a quadratic data term. 

Minimizing a proper, convex, lower-semicontinuous (Isc) function / : X — >■ 
M U {±00} over a finite dimensional vector space X := M" can be regarded as 
finding a zero of its - necessarily maximal monotone [ RW04j - subdifferential 
operator T := df : X 2^ . In the operator splitting framework, Of is assumed 
to be decomposable into the sum of two "simple" operators, T = A + B, of which 
forward and backward steps can practically be computed. Here, we consider 
the (tight) Douglas-Rachford- Splitting iteration jDR56..LM79j with the fixpoint 
iteration 

= (J,^(2J,B-/) + (/-J,B ))(««), (113) 

where J^-t '■= {I + tT)~^ is the resolvent of T. Under the very general con- 
straint that A and B are maximal monotone and A -\- B has at least one zero, 
the sequence (■u''^') is uniquely defined and will converge to a point u, with 
the additional property that u :— J^g{u) is a zero of T |Eck89[ Thm. 3.15, 
Prop. 3.20, Prop. 3.19]. In particular if / = /i + /2 for proper, convex, Isc fi 
such that the relative interiors of their domains have a nonempty intersection, 

ri(dom/i) nri(dom/2) 7^ , (114) 
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it follows |RW04l Cor. 10.9] that df ^ dfi + a/2, and A := dfi, B := 8/2 are 
maximal monotone. As a; € JrOfiiu) ^ x = argmin{(2T)^"'^||a;— 2/||2+/j(2;)}, the 
computation of the resolvents reduces to proximal point optimization problems 
involving only the fi . However, for straightforward splittings of ( 86 1 , such as 

minmax (7 (u, w) + , (115) 

evaluating the resolvents requires to solve ROF-type problems jLKY+09 , which 
is a strongly non-local operation and requires an iterative procedure, introducing 
additional parameters and convergence issues. Instead, we follow the procedure 
in [EB92[ ISet09b| of adding auxiliary variables before splitting the objective in 
order to simplify the individual steps of the algorithm. We introduce w ~ Lu 
and split according to 

minmax(u, s) + (Lu, w) — (5, w) (116) 
= min{u, s) + (Jx){Lu — b) + 6c{u) (117) 

u 

= min h{u, w) := 5Lu=wiu,w) + {u, s) + 5c{u} + aviw - b) . (118) 

U,W V ^ / V ^ / 

We apply the tight Douglas-Rachford iteration to this formulation: Denote 

(uW,u;('=)) J,9B(iI^'\«)('=)), (119) 

= J,^(2u('=) 2u;('=) -wW). (120) 

Then (tiC'+i), w(*^+i)) {u'-^'^ + m'W - + w"-^'^ - w^''^), according to 

( |113[ ). Evaluating the resolvent Jtb is equivalent to a proximal step on /12; 
moreover due to the introduction of the auxiliary variables it decouples, 



u 



w 



'C^) = argmin{i||u'-(2uW-u('=))+rs||2+,5c(ii')} 
u' 2 

= He ((2u(''') -u('^))-Ts) , (121) 

= argmin{;^||w'-(2w('=)-w(^))||^+ap(w'-6)} 
w' 2t 

= (2w('=)-'u;('=))-Tnp (^^(2w('=)-'u;('=)-6)^ . (122) 

In a similar manner, Jt-a resp. the proximal step on hi amounts to the least- 
squares problem 

= argmin{;^ (\\u- u'^^'^Wj + \\w ~ iD^jp) + 6Lu=^iu,w)}. (123) 

u.w ZT \ / 

Substituting the constraint w^''^ = Lm*^*^) yields 

uC'-) = a.Tgmm{\\u - u^'^^ll + \\Lu - w^''^\^} 

U 

= {I + L)-^ [u'^^^ + w^'^^y (124) 



30 



Algorithm 4 Douglas-Rachford Multi-Class Labeling 



1: Choose m(0) e M"^',lS(0) g ^nxdxl (qj. ggi- ^(0) ^ 

2: Choose T > 0. 

3: fc 0. 

4; while (not converged) do 

5: uf*^) 4- He (uC^' -Ts). 

6: u;"W ^ Hp (i(iDW -6)). 

8: w'C^) ^ Lu'^'^). 

9: ^ y(fe) 

10: w^^+l) 4- w'C^') +TW"('=). 

11; A; 4- fc + 1. 

12; end while 



Finally, Alg.jijis obtained by substituting w"^^'> := H-p {^{2w^'''> - w^^) - 6)). 

Solving the linear equation system ( 124 1 can often be greatly accelerated by 
exploiting the fact that under the forward difference discretization with Neu- 
mann boundary conditions, grad^ grad diagonalizes under the discrete cosine 
transform (DCT-2) [StiMl ILKY+OQ] : 

grad^ grad = B"^ diag(c)S (125) 

where B is the orthogonal transformation matrix of the DCT and c is the vector 
of eigenvalues of the discrete Laplacian. In both approaches presented above, L 
is of the form L = A(E) grad for some (possibly identity) matrix A e ] 
First, compute the decomposition A — V^^ dia,g{a)V with a e 
orthogonal matrix V € M'""', V'^ = . Then 



and an 



(1- 



(/ -I- diag(a) (g) diag(c))"^ (/; (x) B) {V < 



(126) 



(see Appendix for the proof). Thus step 124 can be achieved fast and accu- 
rately through matrix multiplications with V, discrete cosine transforms, and 
one 0{nl) product for inverting the inner diagonal matrix. 

In addition to convergence of the primal iterates (m^*''-'), it can be shown that 
the sequence {w"'^^^) from Alg. [Z] actually converges to a solution of the dual 
problem: 



Proposition 9. Let C, V be bounded, closed and convex sets with ri(C) ^ 
anrf ri(I?) ^ 0. Then Alg. ^generates a sequence of primal/ dual feasible pairs 
C X T) such that, for any saddle point {u*,v*) of the relaxed 



(mW,^"^^)) 



problem (86), 



fiu^^y) 

foiw"^"^) 



f{u*)^fDiv*), 
foiv*)^f{u*). 



Moreover, 



f{u^''^)~fiu*) ^ f{u^'''^)-fD{w"^''^) 



(127) 
(128) 

(129) 



provides an upper bound for the suboptimality of the current iterate. 
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Proof. See Appendix. □ 
Thus the Douglas-Rachford approach also allows to use the primal-dual gap 

f{u^''^)-fDiw"^'''^) (130) 

as a stopping criterion. 

Very recently, a generalization of the FPD method |PCBC09a] has been 
proposed |CP10) . The authors show that under certain circumstances, their 
method is equivalent to Douglas-Rachford splitting. As a result, it is possible 
to show that Alg. |4] can alternatively be interpreted as an application of the 
primal-dual method from [CPlOj to the problem formulation (1181. 

6.4 Projection on the Dual Constraint Set 

For the Euclidean metric approach, projection on the unit ball Pf^^ and thus 
on V is trivial: 

ni'L(^) ^ 1 ^, otherwise. (^^l) 

Projection on Vf^^ for the general metric case is more involved. We represent 
■^loc intersection of convex sets, 

vi^^nns, n:^{ve m''^'|^w* = o}, (132) 

i 

Since IIt^ is amounts to a translation along e = (l,...,l), and S is translation 
invariant in the direction of e, the projection can be decomposed: 

IlT,.Jv)=nn{Tls{v)). (133) 

We then follow the idea of jCCPOSj to use Dykstra's method [BD86' for iter- 
atively computing an approximation of II^ using only projections on the indi- 
vidual sets 5''-'. However, any recent multiple-splitting method could be used 
[CP081 iGMOQj . In our case, Hgij can be stated in closed form: 

U..AV) = I \\V' -V^\\i^d{i,j), 

1^ {w^,...,w^), otherwise, 

where 

{^;^ k^{i,j}, 
v^- ^^'-t'^^'' k = ^, (135) 

The complete Dykstra method for projecting a vector v onto S is outlined in 
Alg. [5l While th e sequence of y may be unbounded, x converges to 115(7;) 
(cf. |GM89llXnQ0 ]'). 



32 



Algorithm 5 Dykstra's Method for Projecting onto the Intersection of Convex 

Sets 

1: Associate with each (i, j), i < j a. unique index t e {1, . . . , k}, k :— 1(1 — 1). 

3: z/\...,y'= -f- e R'^'^K 

4: while [x not converged) do 

5: for t = 1, . . . , fc do 

6: n5(.,,,,)(x + ?;*). 

7: ^ X + y* — x'. 

8: X ^ x' . 

9; end for 
10; end while 



6.5 Computing the Objective 

For computing the primal-dual gap and the binarization step, it is necessary to 
evaluate the objective for a given u. Unfortunately, in the general case this is 
nontrivial as the objective's integrand ^ is defined implicitly as 

= max(z,t;). (136) 

We exploit that Hp can be computed and use an iterative gradient-projection 
method, 



V 



ni,(wW+rz) (137) 



for some r > 0, starting with v^^^ = z. As the objective in (136) is linear, T>f^^ is 
bounded and thus \1/ is bounded from above, convergence follows |Gol641 ILP66| 
for any step size r |WX04) : 

lim = 5'(z). (138) 

There is a trade-off in choosing the step size, as large r lead to a smaller number 
of outer iterations, but an increased number of nontrivial operations in the 
projection. We chose t = 2, which worked well for all examples. It is also 
possible to use any of the other nonsmooth optimization methods presented 
above. 



7 Experiments 

Regarding the practical performance of the presented approaches, we focused 
on two main issues: convergence speed and tightness of the relaxation. We will 
first quantitatively compare the presented algorithms in terms of runtime and 
the number of inner iterations, and then provide some results on the effect of 
the Euclidean metric vs. the envelope regularization. 

The algorithms were implemented in Matlab with some core functions, such 
as the computation of the gradient and the projections on the dual sets, imple- 
mented in C-I-+. We used Matlab's built-in FFTW interface for computing the 
DCT for the Douglas-Rachford approach. All experiments were conducted on 
an Intel Core2 Duo 2.66 GHz with 4 GB of RAM and 64-bit Matlab 2009a. 
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Figure 8: Synthetical "four colors" input image for the performance tests. Top 
row: Input image with Gaussian noise, a = 1; local labeling without regular- 
izer. Bottom row: Result with the Potts regularizer and Douglas-Rachford 
optimization; ground truth. 



7.1 Relative Performance 

To compare the convergence speed of the three different approaches, we com- 
puted the relative primal-dual gap at each iteration as outlined in Sect. |6] As 
it bounds the suboptimality of the current iterate (see Sect. 5.5), it constitutes 
a reliable and convenient criterion for performance comparison. 

Unfortunately the gap is not available for the envelope method, as it requires 
the primal objective to be evaluatable. Using a numerical approximation such 
as the one in Sect. |6.5| is not an option, as these methods can only provide a 
lower bound for the objective. This would lead to an underestimation of the 
gap, which is critical as one is interested in the behavior when the gap is very 
close to zero. Therefore we restricted the gap-based performance tests to the 
Euclidean metric regularizer. 

In order to make a fair comparison we generally analyzed the progression of 
the gap with respect to computation time, rather than the number of iterations. 

For the first tests we used the synthetical 256 x 256 "four colors" input 
image (Fig. [8| . It represents a typical segmentation problem with several ob- 
jects featuring sharp corners and round structures above a uniform background. 
The label set consists of three classes for the foreground and one class for 
the background. The image was overlaid with iid Gaussian noise with a ~ 1 
and truncated to [0, 1] on all RGB channels. We used a simple £^ data term, 
Si{x) = \\g{x) — where g{x) € [0, 1]'^ are the RGB color values of the input 
image in x, and is a prototypical color vector for the i-th class. 

The runtime analysis shows that FPD and Douglas-Rachford perform simi- 
lar, while the Nesterov method falls behind considerably in both the primal and 
the dual objective (Fig. [9]). 

The picture changes when considering the gap with respect to the number 
of iterations rather than time, eliminating influences of the implementation and 
system architecture. To achieve the same accuracy, Douglas-Rachford requires 



only one third of the iterations compared to FPD (Fig. 10 1. This advantage does 
not fully translate to the time-based analysis as the DCT steps increase the per- 
step computational cost significantly. However in this example the projections 
on the sets C and T> were relatively cheap compared to the DCT. In situations 
where the projections dominate the time per step, the reduced iteration count 
can be expected to lead to an almost equal reduction in computation time. 
One could ask if these relations are typical to the synthetical data used. 
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Figure 9: Convergence speed for the "four colors" image in Fig.[8j Left: Primal 
(upper) and dual (lower) objectives vs. computation time for the (from top to 
bottom) Nesterov, Fast Primal-Dual (FPD) and Douglas-Rachford methods. 
Right: Detailed view of the FPD and DR methods. The primal and dual 
objectives provide upper and lower bounds for the objective of the true optimum. 
Douglas-Rachford and FPD perform equally, while the Nesterov method falls 
behind. 
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Figure 10: Primal-Dual gap for Fig. [9] with respect to time and number of 
iterations. Top: Primal-Dual gap vs. time and number of iterations. The 
Nesterov method (top) again falls behind, while FPD (center) and Douglas- 
Rachford (bottom) are equally fast. Bottom: Primal-Dual gap vs. number of 
iterations. The Douglas-Rachford method requires only one third of the FPD 
iterations, which makes it suitable for problems with expensive inner steps. 
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■ fpd 

■ nesterov 



Figure 11: Objectives and primal-dual gap for the real- world leaf image in 
Fig. |4] with 12 classes and Potts potential. Left: Primal (upper) and dual 
(lower) objectives vs. time. The Nesterov method (top) falls behind the FPD 
(center) and Douglas-Rachford (bottom) methods. Right: Gap vs. number 
of iterations. As with the synthetical four-colors image (Fig. 10), the Douglas- 
Rachford approach reduces the number of required iterations by approximately 
a factor of 3. 



However we found them confirmed on a large majority of the problems tested. 
As one example of a real- world example, consider the "leaf" image (Fig. |4]). We 
computed a segmentation into 12 classes with Potts regularizer, again based on 
the £^ distances for the data term, with very similar relative performance as for 



the "four colors" image (Fig. 11) 



7.2 Number of Variables and Regularization Strength 

To examine how the presented methods scale with increasing image size, we 
evaluated the "four colors" image at 20 different scales ranging from 16 x 16 to 
512 X 512. Note that if the grid spacing is held constant, the regularizer weights 
must be scaled proportionally to the image width resp. height in order to obtain 
structurally comparable results, and not mix up effects of the problem size and 
of the regularization strength. 

In order to compensate for the increasing number of variables, the stopping 



criterion was based on the relative gap ( 101 ). The algorithms terminated as soon 
as the relative gap fell below 10""*. The Nesterov method consistently produced 
gaps in the 10"'^ range and never achieved the threshold within the limit of 2000 
iterations. Douglas-Rachford and FPD scale only slightly superlinearly with the 
problem size, which is quite a good result for such comparatively simple first- 



order methods (Fig. 12 1. 

While we deliberately excluded influences of the regularizer in the previous 
experiment, it is also interesting to examine how algorithms cope with varying 
regularization strength. We fixed a resolution of 256 x 256 and performed the 
same analysis as above, scaling the regularization term by an increasing sequence 



of A in the [0.1,5] range (Fig. 13) 



Interestingly, for low regularization, where much of the noise remains in the 
solution, FPD clearly takes the lead. For scenarios with large structural changes, 
Douglas-Rachford performs better. We observed two peaks in the runtime plot 
which we cannot completely explain. However we found that at the first peak, 
structures in the image did not disappear at in parallel but rather one after the 
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50000 100000 150000 200000 250000 

Figure 12: Computation time for increasing problem size for the Douglas- 
Rachford (top, dark) and FPD (bottom, light) methods. Shown is the time 
in seconds required to achieve a relative gap of lO"**. The computational effort 
scales slightly superlinearly with the number of pixels. The Nesterov method 
never converged to the required accuracy within the hmit of 2000 iterations. 




Figure 13: Computation time for varying regularization strength A for the 
Douglas-Rachford (top, dark) and FPG (bottom, light) methods. The images 
at the bottom show the final result for the A above. FPD is strong on low 
regularization problems, while Douglas-Rachford is better suited for problems 
with large structural changes. The Nesterov method never achieved the relative 
gap of 10~^ within 2000 iterations. 

other, which might contribute to the slower convergence. Again, the Nesterov 
method never achieved the required accuracy. 

7.3 Breaking Points 

We have no clear explanation why the Nesterov method appears to almost 
always fall behind. However it is possible to compare its behavior with the 
a priori bound from Prop. [8] By inspecting the derivations in the original work 
[Nes04j . it can be seen that exactly one half of the final bound comes from the 
smoothing step, while the other half is caused by the finite number of iterations: 

<5total = (^smooth + (^itor, whcrC (^smooth — <5itor ■ (139) 
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Figure 14: Theoretical vs. practical performance of the Nesterov method for 
Fig.[9j As expected, the method stays below the theoretical per-iteration bound 
'^totai (dashed). At the final iteration, the worst-case total bound Stotai (dotted, 
top) is outperformed by a factor of 7, which implicates that the error introduced 
by smoothing is also well below its worst-case bound ^smooth (dotted, bottom). 



Moreover, ^itor decreases with l/{k + 1) , which gives an a priori per-iteration 
suboptimality bound of 

•^total = '^smooth + Sit.r ■ (MO) 

On the "four colors" image, the actual gap stays just below S^^l^i in the beginning 



(Fig. 14 1. This implies that the theoretical suboptimality bound can hardly be 
improved, e.g. by choosing constants more precisely. Unfortunately, the bound 
is generally rather large, in this case at ^totai = 256.8476 for 2000 iterations. 
While the Nesterov method outperforms the theoretical bound (Jtotai by a factor 
of 2 to 10 and even drops well below the worst-case smoothing error ^smooth, 
it still cannot compete with the other methods, which achieve a gap of 0.3052 
(FPD) resp. 0.0754 (Douglas-Rachford). 

There is an interesting extreme case where the Nesterov method seems to 
come to full strength. Consider the noise-free "triple point" inpainting problem 



(Fig. 15 1. The triple junction in the center can only be reconstructed by the 
Potts regularizer, as the £^ data term has been blanked out around the center. 
By reversing the sign of the data term, one obtains the "inverse triple point" 
problem, an extreme case that has also been studied in |CCP08j and shown to 
be an example where the relaxation leads to a strictly nonbinary solution. 

On the inverse problem, the Nesterov method catches up and even surpasses 
FPD. This stands in contrast with the regular triple point problem, where all 
methods perform as usual. We conjecture that this sudden strength comes from 
the inherent averaging over all previous gradients (step [t] in Alg. [3]): in fact, on 
the inverse problem Douglas-Rachford and FPD display a pronounced oscillation 
in the primal and dual objectives, which is accompanied by slow convergence. 
In contrast, the Nesterov method consistently shows a monotone and smooth 
convergence. 
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Figure 15: Primal and dual objectives for the triple point (top) and inverse 
triple point (bottom) inpainting problems. Left to right: Input image with 
zeroed-out region around the center; relaxed solution; binarized solution; primal 
(upper) and dual (lower) energies vs. time. The triple junction in the center 
has to be reconstructed solely by the Potts regularizer. The inverse triple point 
problem exhibits a strictly nonbinary relaxed solution. For the inverse triple 
point, Douglas- Rachford (bottom) and FPD (center) show an oscillatory be- 
havior which slows down convergence. The Nesterov approach (top) does not 
suffer from oscillation due to the inherent averaging, and surpasses FPD on the 
inverse problem. 
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Figure 16: Performance on the "four colors" image with Potts interaction po- 
tential and the envelope regularizer. Left: Dual objectives of for Douglas- 
Rachford (top) and FPD (bottom) vs. time. The reduced iteration count of 
the Douglas-Rachford method becomes more apparent in the time plot as the 
time per iteration is now dominated by the projection rather than the DCT. 
Right: Time per iteration for Nesterov (top), Douglas-Rachford (center) and 
FPD (bottom). The Nesterov method fails to converge as it accumulates er- 
rors from the approximate projections, which in turn leads to slower and more 
inexact projections. 
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7.4 Performance for the Envelope Relaxation 



Undoubtedly, the difficulty when using the envelope based regularizer comes 
from the slow and inexact projection steps which have to be approximated 
iteratively. Therefore we re-evaluated the "four colors" benchmark image with 
the envelope regularizer. The iterative Dykstra projection (Alg. [s]) was stopped 
when the iterates differed by at most 5 = 10~^, with an additional limit of 50 
iterations. While the gap cannot be computed in this case, the dual objective 
can still be evaluated and provides an indicator for the convergence speed. 

We found that in comparison to the Euclidean metric regularizer from the 
previous examples, the margin between FPD and Douglas-Rachford increases 
significantly. This is consistent with the remarks in Sect. |7.1[ the lower iter- 
ation count of the Douglas-Rachford method becomes more important, as the 
projections dominate the per-iteration runtime (Fig. 16). 

Surprisingly the Nesterov method did not converge at all. On inspecting 
the per-iteration runtime, we found that after the first few outer iterations, the 
iterative projections became very slow and eventually exceeded the limit of 50 
iterations with 5 remaining between 2 and 5. In contrast, 20 Dykstra iterations 
were usually sufficient to obtain 5 = 10^^ (Douglas-Rachford) resp. 5 = 10"^""^ 
(FPD). 

We again attribute this to the averaging property of the Nesterov method: 
as it accumulates the results of the previous projections, errors from the inexact 
projections build up. This is accelerated by the dual variables quickly becoming 
infeasible with increasing distance to the dual feasible set, which in turn puts 
higher demands on the iterative projections. Douglas-Rachford and FPD did 
not display this behavior and consistently required 5 to 6 Dykstra iterations 
from the first to the last iteration. 



7.5 Tightness of the Relaxations 

Besides the properties of the optimization methods, it is interesting to study 
the effect of the relaxation - i.e. Euclidean metric or envelope type - on the 
relaxed and binarized solutions. 

To get an insight into the tightness of the relaxations, we used the Douglas- 
Rachford method to repeat the "triple point" inpainting experiment in |CCP08| 



with both relaxations (Fig. 17). Despite the inaccuracies in the projections, 
the envelope regularizer generates a nearly binary solution: 97.6% of all pixels 
were assigned "almost binary" labels with an distance of less than 0.05 to 
one of the unit vectors {e^,...,e'}. For the Euclidean metric method, this 
constraint was only satisfied at 88.6% of the pixels. The result for the envelope 
relaxation is very close to the sharp triple junction one would expect from the 
continuous formulation, and shows that the envelope relaxation is tighter than 
the Euclidean metric method. 

However, after binarization both approaches generate almost identical dis- 
crete results. The Euclidean metric method was more than four times faster, 
with 41.1 seconds per 1000 iterations vs. 172.16 seconds for the envelope relax- 
ation, which required 8-11 Dykstra steps per outer iteration. 

While the triple point is a problem specifically designed to challenge the 
regularizer, real-world images usually contain more structure as well as noise, 
while the data term is available for most or all of the image. To see if the above 
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Figure 17: Tightness of the relaxation. Top row: In the input (left), the data 
term was blanked out in a quadratic region. All structure within the region is 
generated purely by the regularizer with a standard Potts interface potential. 
The envelope relaxation is tighter and generates a much more "binary" solu- 
tion (center) than the Euclidean metric method (right). Bottom row: After 
binarization of the relaxed solutions, the envelope (left) and Euclidean metric 
(center) methods generate essentially the same solution, as can be seen in the 
difference image (right). The Euclidean metric method performed more than 
four times faster due to the inexpensive projections. 



results also hold under these conditions, we repeated the previous experiment 



with the "sailing" image and four classes (Fig. 18). The improved tightness of 
the envelope relaxation was also noticeable, with 96.2% vs. 90.6% of "almost 
binary" pixels. However, due to the larger number of labels and the larger image 
size of 360 x 240, runtimes increased to 4253 (envelope) vs. 420 (Euclidean 
metric) seconds. 

The relaxed as well as the binarized solutions show some differences but 
are hard to distinguish visually. It is difficult to pinpoint if these differences 
are caused by the tighter relaxation or by numerical issues: while the Douglas- 
Rachford method applied to the Euclidean metric relaxation converged to a final 
relative gap of 1.5 • 10^^, no such bound is available to estimate the accuracy of 
the solution for the envelope relaxation, due to the inexact projections and the 
intractable primal objective. 



7.6 Binarization and Global Optimality 

As an example for a problem with a large number of classes, we analyzed the 
"penguin" inpainting problem from jSZS^06j . We chose 64 labels corresponding 
to 64 equally spaced gray values. The input image contains a region where the 
image must be inpainted in addition to removing the considerable noise. Again 
the data term was generated by the £^ distance, which reduces here to the 
absolute difference of the gray values. In order to remove noise but not overly 
penalize hard contrasts, such as between the black wing and the white front, 
we chose a regularizer based on the truncated linear potential as introduced in 
Sect. 1131 

Due to the large number of labels, this problem constitutes an example 
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Figure 18: Effect of the choice of relaxation method on the real-world "sailing" 
image (image courtesy of F. Becker). Top row: Four-class segmentation using 
envelope (left) and Euclidean metric (right) methods. Shown are the solutions 
of the relaxed problem. Bottom row: Original image (left); difference im- 
age of the discretized solutions (right). While the envelope relaxation leads to 
substantially more "almost discrete" values in the relaxed solution, it also runs 
more than 10 times slower and does not provide a suboptimality bound. The 
generated solutions are visually almost identical. 



where the Euclidean metric approach is very useful. As the complexity of the 
projections for the envelope relaxation grows quadratically with the number 
of labels, computation time becomes prohibitively long for a moderate amount 
of classes. In contrast, the Euclidean metric method requires considerably less 
computational effort and still approximate the potential function to a reasonable 
accuracy (Fig. [7|. 

In the practical evaluation, the Douglas-Rachford method converged in 1000 
iterations to a relative gap of 8.3 • 10~^, and recovered both smooth details 
near the beak, and hard edges in the inpainting region (Fig. 19). This example 
also clearly demonstrates the superiority of the improved binarization scheme 
proposed in Sect. |5.6[ As opposed to the first-max scheme, the improved scheme 
generated considerably less noise. The energy increased only by 2.78% compared 
to 15.78% for the first-max approach. 

The low energy increase is directly related to global optimality for the dis- 
crete problem: as the relaxed solution is provably nearly optimal, we conclude 
that the energy of the binarized solution must lie within 2.78% of the optimal 
energy for the original combinatorial problem ([l]) . Similar results were obtained 
for the other images: 5.64% for the "four colors" demo, 1.02% for the "leaf" 
image and 0.98% for the "triple point" problem. 

These numbers indicate that the relaxation seems to be quite tight in many 
cases, and allows to recover good approximations for the solution of the discrete 
combinatorial labeling problem by solving the convex relaxed problem. 



8 Conclusion and Further Work 

The present work provides a reference and framework for continuous multilabel- 
ing approaches. The presented algorithms are robust and fast and are suited for 
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Figure 19: Denoising/inpainting problem with 64 classes and the nontrivial 
truncated linear potential approximated by an Euclidean metric. Left to right: 
Noisy input image with inpainting region marked black |SZS"'"06] : result with 
first-max binarization; result with the proposed binarization scheme (104). The 
first-max method introduces noticeable noise in the binarization step. The pro- 
posed method takes into account the non-uniformity of the used "cut-linear" 
potential (Fig. [?]), resulting in a clean labeling and an energy increase of only 
2.78% vs. 15.78% for the first-max method. This shows that the obtained 
solution solves the originally combinatorial multiclass labeling problem to a 
suboptimality of 2.78%. 



massive parallelization. From the experiments it became clear that solving the 
convex relaxed problem allows to recover very good solutions for the original 
combinatorial problem in most cases. 

The performance evaluations showed that the Douglas-Rachford method 
consistently requires about one third of the iterations compared to the Fast 
Primal-Dual method. For low regularization and fast projections, FPD out- 
performs the Douglas-Rachford method. In all other cases, Douglas-Rachford 
performs equally or better, with a speedup of 2-3 if the projections are expen- 
sive. Overall, the proposed Douglas-Rachford method approach appears to be 
a solid all-round method that also handles extreme cases well. 

From the viewpoint of numerics, in our evaluations we did not specifically 
consider the effect of choosing different step sizes for the Douglas-Rachford 
method. Also, it seems as if the smooth optimization step in the Nesterov 
method usually performs much better than its theoretical bound. Adjusting 
the strategy for choosing the smoothing parameter could yield a faster overall 
convergence and possibly render the method competitive. In this regard, it 
would also be interesting to include the inexactness of the projections into the 
convergence analysis. There are also several theoretical questions left, such as 
how to include non-metric distances into the continuous formulation. 

In any case we think that the present framework unites continuous and 
discrete worlds in an appealing way, and hopefully contributes to reducing the 
popularity gap compared to purely grid-based, graph cut methods. 
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9 Appendix 

Proposition 10. Let v = {v^, . . . ,v^) G 'Df^^, then 

\\v\\ < mm(^d{i,j] 
j 

Proof of Prop. From the constraint 
arbitrary but fixed i £ {1, . . . ,1}, 



(141) 



in (701 we deduce, for 




-0 + l\\v' 



Y,{\\v^f~2{v\v^) + \y\ 
I I 



i\\2\ 



As i was arbitrary this proves the assertion. 



(142) 



□ 



Proof of Eq. (126|. We mainly use the fact that {P®Q){R®S) = {PR)®{QS) 



for matrices P, Q, R, S with compatible dimensions: 



[I + L^L)-^ 



I + {A® grad)^ {A ® grad) 
/ + {A^ A) ® (^grad^ grad) ) 



(143) 
(144) 
(145) 



(/ + {V-^ diag(a)F) (g) {B-^ diag(c)B))"^ 
(/ + {V-^ ® B-^) (diag(a) ® diag(c)) (V ® S))"^(146) 
{{V-^ ® B-^) {I + diag(a) ® diag(c)) {V ® S))"^(147) 
{V-^ ® B-^) {I + diag(a) (g> diag(c))"^ {V (g> B) . (148) 



Using = V^, p6| follows 



□ 

Proof of Prop. The idea of the proof is to show that the sequence (w''^*^^) is 
exactly t he m inimizing sequence produced by the algorithm applied to the dual 
problem ( |87| , with step size 1/t. Thus, if the dual algorithm converges, (w"'^'^^) 
converges to the solution of the dual problem, which proves the assertion. To 



44 



Algorithm 6 Dual Douglas-Rachford for Multi-Class Labeling 

1: Choose e ^nxdxl^^iO) ^ jjnx/^ 

2: Choose td > 0. 

3: fc 0. 

4; while (not converged) do 

6: z"('=)^nc(;:^(z('=)-s)). 

7: -y'C^) (/ + ii^)"^ ((2f - vf'^)) + - 2tdz"('=))). 

8: z'W ^ (-L^)t-'('=). 

10: zC^+i) ^ z'W +TDx;"('=). 
11; k ^ k+1. 
12: end while 



show the equivalency, first note that the formulation of the primal problem from 

minmax(ii, s) + {Lu, v) — (6, v) (149) 

already covers the dual problem 

maxmin(u, s) + {Lu, v) — {b, v) 

veT) uec 

= mmmax{b,v) + {u,—L^v) — {u,s) (150) 
vev uec 

by the substitutions 

w o u, C o 2?, 6 o s, L o -L^ . (151) 

The dual problem can thus be solved by applying the above substitutions to 
Alg. |4] Additionally substituting w 4-> z in Alg. [4] in order to avoid confusion 
with iterates of the primal method, leads to the dual algorithm, Alg. [6] We 
first show convergence of Alg. |4] and Alg. |6] By construction, these amount to 
applying Douglas-Rachford splitting to the primal resp. dual formulations 

TaiiiSLu=wiu,w) + {u, s) + Sciu) + a-piw - b) , (152) 

— ■.hi(u,w) —:h2{u,w) 

mill S_ i^T ^^^{v, z) + {v,b) + Sv{v) + ac{z — s) . (153) 

V,Z ✓ V ' ✓ 

= :ho,i{v,z) =:hD,2(v,z) 

As both parts of the objectives are proper, convex and Isc, it suffices to show 
additivity of the subdifferentials |RW04i Cor. 10.9] |Eck89[ Thm. 3.15] |Eck89[ 
Prop. 3.20, Prop. 3.19] jEB92j . Due to the boundedness of C and V, 

ri (dom /la) n ri (dom hi) = (ri (C) x M"'*') n {Lu = w} 

= {{u,Lu)\ueri{C)} , (154) 

ri(dom/i£,^2)nri(dom/i£,a) = (ri(2?) x M"') n {-L^u = z} 

= {(w,-L^v)|w G ri(2?)}. (155) 
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Both of these sets are nonempty as ri(C) 7^ ^ This impHes additivity of 

the subdifferentials for the proposed objective |RW041 Cor. 10.9] and thus con- 
vergence (in the iterates as weh as the objective) of the tight Douglas-Rachford 
iteration (cf. |Eck89l Thm. 3.15]). 

We will now show that the primal and dual algorithms essentially generate 
the same iterates, i.e. from r := rp = 1/td, u^*^^ = r^^'^^ and w^*^^ = tv'^^\ it 
follows that = rzC^+i), w^'^+^^ = rv^k+i) ^(k) ^ ^n(k) ^ ^(k) ^ ^n{k) ^ 

The last two equalities follow immediately from the previous ones by definition 
of the algorithms. Furthermore, 

= v^^'^ + {I + LL^)-^ 

•((2«('=) - w^^') + {-L){z''^^ - 2t-1z"('=))) - (156) 
= T-i7D('=) + (/ + LL^)-i((2u;"W - r-^w^'^^) 

+ (-L)(r-iu('^) - 2r-iu('=))) - w'"^^^ (157) 
= r-ii<)('=) + (/ + LLT)-1(2u;"('=) - r-^w^'^^) - w"^''^ 

-(/ + LL'^)-^L{t~^u^''^ - 2t-\^'''>) . (158) 

By the Woodbury identity, {I+LL'^)^^ = I - L{I + Ly^ and in particular 
(/ + LL^y^L = L{I + L'^L)-\ therefore 

-L{I + L^L)-iL^(2u;"('=) - t~^w^'''>) - 

-L{I + L)-^{t-^u^^^ - 2t-^4^'^) (159) 



w 



"(fe) 



L{I + L^L)-^ 



•(L^(2u;"('=) - r-i^C^)) + {t-^u^^'^ - 2t-^u^^^^)) (160) 
= T-^L{I + L^L)-^ 

•((2m('=) - M^'^)) + i^(u;« - 2™"('=))) + (161) 

= r-i(Lu'W +Tw"W) = t-1id('=+i). (162) 

By primal-dual symmetry, the same proof shows that ■!2('^+^' = rz'^'^+^'. To con- 
clude, we have shown that i^''^^) = -yC^) for tjj = \/t and suitable initialization 
of the dual method. As the dual method was shown to converge, {w"'^'^^) must 
be a maximizing sequence for fjj. Together with the convergence of u'^^^ to a 



minimizer of /, this proves the first part of the proposition. Equation ( 129 ) 
follows, as for any saddle-point (u*,w*). 

Id {w"^''^) fD{v*) = f{u*) ^ /(^iW) (163) 

holds, and therefore /(«<'=)) - /(u*) ^ and f{u*) > fo {w"'-''^). □ 
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